ENZYMATIC SYNTHESIS OF OLIGONUCLEOTIDE TAGS

Field of the Invention 

The invention relates generally to methods for synthesizing collections of minimally
cross-hybridizing oligonucleotide tags for identifying, sorting, and/or tracking molecules,
especially polynucleotides.

BACKGROUND 

Specific hybridization of oligonucleotides and their analogs is a fundamental process
that is employed in a wide variety of research, medical, and industrial applications, including
the identification of disease-related polynucleotides in diagnostic assays, screening for clones
of novel target polynucleotides, identification of specific polynucleotides in blots of mixtures
of polynucleotides, amplification of specific target polynucleotides, therapeutic blocking of
inappropriately expressed genes, DNA sequencing, and the like, e.g. Sambrook et al,
Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory, New
York, 1989); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993);
Milligan et al, J. Med. Chem., 36: 1923-1937 (1993); Drmanac et al, Science, 260: 1649-1652
(1993); Bains, J. DNA Sequencing and Mapping, 4: 143-150 (1993).

Specific hybridization has also been proposed as a method of tracking, retrieving, and
identifying compounds labeled with oligonucleotide tags, e.g. Brenner, International
application PCT/US95/12791; Church et al, Science, 240: 185-188 (1988); Brenner and
Lerner, Proc. Natl. Acad. Sci., 89: 5381-5383 (1992); Alper, Science, 264: 1399-1401
(1994); Chetverin et al, Biotechnology, 12: 1093-1099 (1994); and Needels et al, Proc. Natl.
Acad. Sci., 90: 10700-10704 (1993). The successful implementation of such tagging and
sorting schemes depends in large part on the success in achieving specific hybridization
between a tag and its complement. That is, for an oligonucleotide tag to successfully identify a
substance, the number of false positive and false negative signals brought about by incorrect
hybridizations must be minimized. And for oligonucleotide tags to effectively sort molecules,
the number of tags hybridized to complements at incorrect sites must be minimized.

Unfortunately, incorrect hybridizations brought about by the creation of stable duplexes
containing mismatches are not uncommon because base pairing and base stacking free energies
vary widely among nucleotides in a duplex or triplex structure. For example, a duplex
consisting of a repeated sequence of deoxyadenosine (A) and thymidine (T) bound to its
complement may have less stability than an equal-length duplex consisting of a repeated
sequence of deoxyguanosine (G) and deoxycytidine (C) bound to a partially complementary
target containing a mismatch. Thus, if a desired compound from a large combinatorial
chemical library were tagged with the former oligonucleotide, a significant possibility would
exist that, under hybridization conditions designed to detect perfectly matched AT-rich
duplexes, undesired compounds labeled with the GC-rich oligonucleotide, even in a
mismatched duplex, would be detected or sorted along with the perfectly matched duplexes
consisting of the AT-rich tag. Even though reagents, such as tetramethylammonium chloride,
are available to negate base-specific stability differences of oligonucleotide duplexes, the effect
of such reagents is often limited and their presence can be incompatible with, or render more
difficult, further manipulations of the selected compounds, e.g. amplification by polymerase
chain reaction (PCR), or the like.

Such problems have been addressed in the "solid phase" cloning technique, described
in Brenner, International application PCT/US95/12791, by the development of oligonucleotide
tags synthesized combinatorially from a set of so-called minimally cross-hybridizing
oligonucleotides, or "words." The words, which are oligonucleotides usually 3 to 6 nucleotides
in length, differ from every other member of the same set by at least two nucleotides. Thus, a
given word cannot form a duplex with the complement of any other word of the set with
fewer than two mismatches. Of course, minimally cross-hybridizing sets are preferably formed
from words differing from one another by even more than two nucleotides.

In such a scheme, different oligonucleotide tags constructed from concatenations of
such words will differ from one another by at least two nucleotides, or by at least the number
of nucleotides that their component words differ by. Therefore, by judiciously selecting word
length, differences between words in a set, and the number of words per tag, one can obtain a
large set, or repertoire, of oligonucleotide tags that each differ from one another by a
significant percentage of their nucleotides. Such repertoires permit tagging and sorting of
molecules with a much higher degree of specificity than ordinary oligonucleotides.
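The claim that tags built from concatenated words inherit the minimum difference of their word set can be checked on a small case. The following is only an illustrative sketch: the three-nucleotide word set `WORDS` is a hypothetical toy set chosen so that every pair of words differs at two positions, and the function names are not taken from the text.

```python
from itertools import combinations, product


def mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(seqs) -> int:
    """Smallest mismatch count over all distinct pairs of sequences."""
    return min(mismatches(a, b) for a, b in combinations(seqs, 2))


# Hypothetical toy word set (not from the text): each pair differs at 2 positions.
WORDS = ["aaa", "att", "tat", "tta"]

# Every tag obtainable by concatenating two words of the set.
TAGS = ["".join(pair) for pair in product(WORDS, repeat=2)]

word_min = min_pairwise_mismatches(WORDS)
tag_min = min_pairwise_mismatches(TAGS)

if __name__ == "__main__":
    print(f"words: {len(WORDS)}, minimum difference between words: {word_min}")
    print(f"tags:  {len(TAGS)}, minimum difference between tags:  {tag_min}")
```

Two distinct tags must differ in at least one word position, so the tag-level minimum can never fall below the word-level minimum.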

Unfortunately, current methods of solid phase synthesis, although highly efficient, still
lead to a significant fraction of failure sequences when oligonucleotide tags start to exceed 30
to 40 nucleotides in length. The presence of such failure sequences can have a significant
impact on solid phase cloning and sorting schemes, such as the one described in Brenner (cited
above). When tag complements are synthesized separately from their corresponding
oligonucleotide tags, presence of different sets of failure sequences among the two reaction
products means that not every oligonucleotide from one reaction will necessarily have a
complementary oligonucleotide among products of the other reaction. In particular, failure
sequences produced in one reaction will generally not have complementary failure sequences
produced in the other reaction. While this is not a problem for tag complements
combinatorially synthesized on solid phase supports because the number and kind of failures
are randomly distributed among a population of predominantly correct-sequence
oligonucleotides, for tags attached to DNAs which are sampled and amplified, a significant
probability exists that if one or more of the sampled tags contain failure sequences, no solid
phase supports will exist for them that have a population of perfect complements.
Consequently, DNAs with such tags cannot be effectively sorted.
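The growth of the failure fraction with tag length can be illustrated with a simple model. This is a sketch under stated assumptions only: the text gives no coupling efficiency, so the value 0.99 used below is hypothetical, and the model assumes every coupling step succeeds independently with the same probability.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that received every coupling.

    Assumes independent coupling steps with a constant efficiency. A chain
    of n nucleotides needs n - 1 couplings onto the support-bound first
    residue.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    ASSUMED_EFFICIENCY = 0.99  # hypothetical value, not taken from the text
    for n in (20, 30, 40, 60):
        ok = full_length_fraction(n, ASSUMED_EFFICIENCY)
        print(f"{n} nt: full length {ok:.3f}, failure sequences {1 - ok:.3f}")
```

Under this model the failure fraction rises steadily with length, which is consistent with the qualitative statement above that failures become significant beyond 30 to 40 nucleotides.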

In view of the above, it would be useful if there were available a method of producing 
oligonucleotide tags which would avoid or minimize the chance of there being sampled and 
amplified tags that contain failure sequences.

Summary of the Invention 
Accordingly, objectives of my invention include, but are not limited to, providing a 
method of synthesizing oligonucleotide tags which minimizes the production of failure 
sequences; providing an enzymatic method of synthesizing oligonucleotide tags by the
combinatorial addition of words; providing a method of convergent synthesis of
oligonucleotide tags from error-free components; providing a method of constructing tag-DNA
conjugates whose tags are free of failure sequences; and providing compositions comprising
novel oligonucleotide tags.

My invention achieves these and other objectives by providing a method of
synthesizing oligonucleotide tags that comprises successive cycles of cleavage of an
oligonucleotide tag precursor to permit the ligation of one or more words from a minimally
cross-hybridizing set, ligation of the one or more words, and amplification of the ligated structure.
Preferably, repertoires of oligonucleotide tags of a predetermined length are assembled from
words, or sub-assemblies of words, that are free of failure sequences. Preferably, such
error-free words or sub-assemblies of words are obtained either by separately synthesizing and
sequencing individual words or sub-assemblies of words prior to assembly, or by successive
ligations of adaptors having protruding strands consisting of word sequences that select
complementary word sequences on the protruding strand of a growing tag. Preferably, in the
former embodiment words or sub-assemblies of words are inserted into and maintained in
conventional cloning vectors, after which they are sequenced to confirm that no errors are
present. For use in the method of the invention, the words or sub-assemblies of words are
excised from the vectors, mixed, and ligated to an oligonucleotide tag precursor. Preferably, in
the latter embodiment, error-containing words are excluded from the assembly process by
requiring that the single stranded form of each added word anneal to a perfectly matched
complement of an oligonucleotide tag precursor in a ligation step. If a mismatch exists
because a failure sequence is present in one of the strands, no ligation will take place, either
precluding further growth of the tag if the failure is carried by its protruding strand, or
promoting the annealing of a different word if the failure is carried by the word being added.

The invention further includes repertoires of oligonucleotide tags consisting of a
plurality of words wherein at least two words of the plurality are separated by one or two
nucleotides.

The present invention overcomes difficulties in sorting polynucleotides with
oligonucleotide tags synthesized by currently available methods. By providing oligonucleotide
tags free of failure sequences, sampled and amplified tag-polynucleotide conjugates are
assured of finding a tag complement with which to form a perfectly matched duplex.


Brief Description of the Drawings 
Figure 1a illustrates a preferred embodiment of the invention in which oligonucleotide
tags are assembled by successive additions of one or more words to an oligonucleotide tag
precursor.

Figure 1b illustrates a preferred embodiment of the invention in which oligonucleotide
tags are assembled by convergent additions of increasingly larger sub-assemblies of words.

Figure 2 illustrates a preferred embodiment of the invention wherein oligonucleotide
tags are assembled by successive additions and self-selection of words to an oligonucleotide tag
precursor.


Definitions 

As used herein, the term "word" means an oligonucleotide selected from a minimally
cross-hybridizing set of oligonucleotides, as disclosed in U.S. patent 5,604,097; International
patent application PCT/US96/09513; and allowed U.S. patent application Ser. No.
08/659,453; which references are incorporated by reference. An oligonucleotide tag of the
invention consists of a plurality of words, or oligonucleotide subunits, that are selected from
the same minimally cross-hybridizing set. In such a set, a duplex or triplex consisting of a
word of the set and the complement of any other word of the same set contains at least two
mismatches. Preferably, a duplex or triplex consisting of a word of the set and the
complement of any other word of the same set contains an even larger minimum number of
mismatches, e.g. 3, 4, 5, or 6, depending on the length of the words. Still more preferably, the
minimum number of mismatches is either 1, 2, or 3 less than the length of the word. Most
preferably, the minimum number of mismatches is 1 or 2 less than the length of the word.

"Complement" or "tag complement" as used herein in reference to oligonucleotide tags
refers to an oligonucleotide to which an oligonucleotide tag specifically hybridizes to form a
perfectly matched duplex or triplex. In embodiments where specific hybridization results in a
triplex, the oligonucleotide tag may be selected to be either double stranded or single stranded.
Thus, where triplexes are formed, the term "complement" is meant to encompass either a
double stranded complement of a single stranded oligonucleotide tag or a single stranded
complement of a double stranded oligonucleotide tag. Usually, populations of identical tag
complements are attached to a spatially defined region of a solid phase support. Preferably,
such solid phase supports are microparticles and the defined region is the entire surface of the
microparticle.



WO 00/20639 PCT/US99/22585

The term "oligonucleotide" as used herein includes linear oligomers of natural or
modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, anomeric
forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a
target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such
as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of
base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs
thereof to form oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to
several tens of monomeric units. Whenever an oligonucleotide is represented by a sequence of
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order
from left to right and that upper or lower case "A" denotes deoxyadenosine, upper or lower
case "C" denotes deoxycytidine, upper or lower case "G" denotes deoxyguanosine, and upper
or lower case "T" denotes thymidine, unless otherwise noted. Analogs of phosphodiester
linkages include phosphorothioate, phosphorodithioate, phosphoranilidate, phosphoramidate,
and the like. Usually oligonucleotides of the invention comprise the four natural nucleotides;
however, they may also comprise non-natural nucleotide analogs. It is clear to those skilled in
the art when oligonucleotides having natural or non-natural nucleotides may be employed, e.g.
where processing by enzymes is called for, usually oligonucleotides consisting of natural
nucleotides are required.

"Perfectly matched" in reference to a duplex means that the poly- or oligonucleotide
strands making up the duplex form a double stranded structure with one another such that every
nucleotide in each strand undergoes Watson-Crick basepairing with a nucleotide in the other
strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine,
nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a
triplex, the term means that the triplex consists of a perfectly matched duplex and a third
strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex between a tag
and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails
to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding.
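The duplex definitions above, together with the 5'->3' writing convention, can be illustrated with a short sketch. It is deliberately limited: it covers only Watson-Crick pairing between the four natural nucleotides in equal-length strands, and does not model triplexes, Hoogsteen pairing, or nucleoside analogs. The function names are illustrative, not from the text.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick complement of a 5'->3' sequence, itself written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def duplex_mismatches(tag: str, other: str) -> int:
    """Count base pairs that fail Watson-Crick pairing.

    Both strands are given 5'->3' and are annealed antiparallel, end to end.
    """
    if len(tag) != len(other):
        raise ValueError("this sketch only handles equal-length strands")
    # Base needed opposite each tag position, read along the tag 5'->3'.
    partner = reverse_complement(other)
    return sum(x != y for x, y in zip(tag.lower(), partner))


def is_perfectly_matched(tag: str, other: str) -> bool:
    """True when every nucleotide of each strand is Watson-Crick paired."""
    return duplex_mismatches(tag, other) == 0


if __name__ == "__main__":
    tag = "ATGCCTG"  # example sequence used in the definition above
    print(reverse_complement(tag))
    print(is_perfectly_matched(tag, "CAGGCAT"))
    print(duplex_mismatches(tag, "CAGGCAA"))
```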

As used herein, the term "complexity" in reference to a population of polynucleotides
means the number of different species of molecule present in the population.

As used herein, the term "failure sequence" refers to a synthetic oligonucleotide or
polynucleotide that does not have the correct, or intended, length and/or sequence because of a
failure in a step of the synthetic process, e.g. spurious chain initiation, failure of a coupling step,
failure of a capping step, chain scission, or the like.

As used herein, "amplicon" means the product of an amplification reaction. That is, it
is a population of polynucleotides, usually double stranded, that are replicated from a few
starting sequences. Preferably, amplicons are produced either in a polymerase chain reaction
(PCR) or by replication in a cloning vector.
Detailed Description of the Invention
The invention provides an enzymatic method for synthesizing a repertoire of
oligonucleotide tags whose members are substantially free of failure sequences.
Oligonucleotide tags are combinatorially synthesized by the assembly of error-free words or
sub-assemblies of words in a series of enzymatic steps. Generally, the method of the
invention comprises the following steps: (a) providing a repertoire of oligonucleotide tag
precursors in an amplicon, the oligonucleotide tag precursors each comprising one or more
words, and each of the one or more words being selected from the same minimally
cross-hybridizing set; (b) cleaving the amplicon at a word in each of the oligonucleotide
precursors to form one or more ligatable ends on each oligonucleotide tag precursor; (c)
ligating one or more words to the one or more ligatable ends to elongate each of the
oligonucleotide tag precursors; (d) amplifying the elongated oligonucleotide tag precursors in
the amplicon; and (e) repeating steps (b) through (d) until a repertoire of oligonucleotide tags
having the predetermined length is formed. The repertoire of oligonucleotide tags of the
desired length contained in the final amplicon may then be inserted into a convenient cloning
vector, as taught by Brenner et al, International patent application PCT/US96/09513.
Preferably, each of the oligonucleotide tag precursors has the same length, which is determined
by word length, the number of words making up the initial oligonucleotide tag precursor, and
the stage of the assembly process, i.e. how many words or sub-assemblies of words have been
added by operation of the method of the invention. Preferably, the amplicon of the method is a
population of cloning vectors wherein different oligonucleotide tags or oligonucleotide tag
precursors are represented in equal proportions as inserts of such vectors. Preferably,
whenever the oligonucleotide tag precursors are cleaved for the ligation of an additional word
or sub-assembly of words, the cleavage takes place at the same word for all the oligonucleotide
tag precursors of the repertoire. Preferably, the step of cleaving is carried out with a type IIs
restriction endonuclease which cleaves at the same word for all the oligonucleotide tag
precursors of the repertoire and produces ligatable ends having protruding strands. As used
herein, the term "ligatable ends" means ends of a double stranded DNA that can be ligated to
another double stranded DNA, including blunt-end ligation and "sticky" end ligation.
Preferably, ligatable ends are sticky ends.

The invention further includes repertoires of oligonucleotide tags defined by the
following formula:

w_1 (N)_{x_1} w_2 (N)_{x_2} ... (N)_{x_{n-1}} w_n

wherein w_1, w_2, ... w_n are words selected from the same minimally cross-hybridizing set, the
words having a length of from three to fourteen nucleotides or basepairs; n is an integer in the
range of from 4 to 10; N is a nucleotide or basepair; and x_1, x_2, ... x_{n-1} are each an integer
indicating how many nucleotides or basepairs, N, are present at the given location in the
sequence of words, x_1, x_2, ... x_{n-1} each being selected from the group consisting of 0, 1, 2, 3,
and 4, provided that at least one of x_1, x_2, ... x_{n-1} is 1, 2, 3, or 4. Preferably, x_1, x_2, ... x_{n-1}
are each selected from the group consisting of 0, 1, and 2, provided that at least one of x_1, x_2,
... x_{n-1} is 1 or 2. Preferably, oligonucleotide tags of the above formula are synthesized by the
method of the invention.

Preferably, words are from three to fourteen nucleotides or basepairs in length; and
more preferably, words are from four to six nucleotides or basepairs in length. Most
preferably, words are four nucleotides or basepairs in length. Usually, words consist of a
linear sequence of nucleotides selected from the group consisting of A, C, G, and T. For
words constructed from 3 of the 4 natural nucleotides, the following word sizes, differences
between words of the same set, and set sizes are preferred:

Word Length    Difference Between Words    Set Size

     4                    3                    8
     5                    4                    6
     6                    4                    9
     7                    5                    8
     8                    5                   16
     8                    6                    9

In some embodiments employing words of the above characteristics, subsets of the computed
sets may be employed so that only words having specified GC content, melting temperature,
reduced likelihood of self annealing, hairpin formation, or the like, are used to form tags. The
above set sizes were computed using the algorithms listed in Brenner et al, PCT/US96/09513
and allowed U.S. patent application Ser. No. 08/659,453. Exemplary minimally
cross-hybridizing sets of words for use with the invention are listed in the following table:
Table I

Exemplary Sets of Minimally Cross-Hybridizing Words
Number of Nucleotides per Word (Minimal No. of Mismatches)

4(3)     5(4)     6(4)      7(5)       8(5)

gatt     tagta    gattag    gtaaaat    atgagtat
tgat     aaaag    agagtt    aaaagga    aggaagtg
taga     agggt    agttga    aaggaag    agggtaga
tttg     ggtaa    gagatt    aattttt    agttgaag
gtaa     gtatt    gttggt    ggaggtg    gagatggt
agta     tttgg    tggctg    gggtaga    gaggatag
atgt              ttagag    tgtataa    gagtgata
aaag              ttgaga    ttattgg    ggaagtga
                  atgtat               ggatagat
                                       gtaatatg
                                       gttgggaa
                                       tatagttg
                                       tattagga
                                       tgtgttat
                                       ttatgagt
                                       ttgttgag
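The defining property of a column of Table I can be checked directly. The sketch below does so for the 4(3) column only; the word list is copied from the table, while the function names are illustrative and the check counts Watson-Crick mismatches only, which for a word against the complement of another word equals the number of positions at which the two words differ.

```python
from itertools import combinations

# Four-base words from the 4(3) column of Table I.
FOUR_BASE_WORDS = ["gatt", "tgat", "taga", "tttg", "gtaa", "agta", "atgt", "aaag"]


def mismatches(a: str, b: str) -> int:
    """Positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(words) -> int:
    """Minimum number of differences over every pair of distinct words."""
    return min(mismatches(a, b) for a, b in combinations(words, 2))


def alphabet(words) -> set:
    """Set of nucleotides actually used by the words."""
    return set("".join(words))


if __name__ == "__main__":
    print("set size:", len(FOUR_BASE_WORDS))
    print("minimum mismatches:", min_pairwise_mismatches(FOUR_BASE_WORDS))
    print("nucleotides used:", sorted(alphabet(FOUR_BASE_WORDS)))
```

For this column the check agrees with the table heading and with the preceding word-size table: eight words, built from three of the four nucleotides, with every pair differing at three positions.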



The length of oligonucleotide tags in a repertoire may vary widely depending on
several factors, including the size or complexity of the repertoire desired, the difficulty in
synthesizing corresponding tag complements on solid phase supports, the particular
application, and the like. Generally, longer oligonucleotide tags permit the generation of larger
repertoires; however, reliable synthesis of tag complements that exceed 40-50 nucleotides
becomes increasingly difficult and monitoring and/or exercising quality control of mixtures of
oligonucleotides becomes increasingly difficult as complexity increases. Thus, selection of
particular tag lengths and complexities requires design tradeoffs by a practitioner of ordinary
skill. Preferably, oligonucleotide tags of the invention are in the range of from 18 to 60
nucleotides in length. More preferably, oligonucleotide tags are in the range of from 18 to 40
nucleotides in length.
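The tradeoff between tag length and repertoire size can be tabulated for a given word set. The sketch below assumes tags made of contiguous words with no spacer nucleotides, so that tag length is word length times the number of words and repertoire size is set size raised to the number of words; the function name is illustrative.

```python
def options_in_range(word_length: int, set_size: int, min_nt: int, max_nt: int):
    """List (words_per_tag, tag_length_nt, repertoire_size) for every whole
    number of words whose total length falls within [min_nt, max_nt]."""
    if word_length < 1 or set_size < 1:
        raise ValueError("word_length and set_size must be positive")
    options = []
    words_per_tag = 1
    while word_length * words_per_tag <= max_nt:
        length = word_length * words_per_tag
        if length >= min_nt:
            options.append((words_per_tag, length, set_size ** words_per_tag))
        words_per_tag += 1
    return options


if __name__ == "__main__":
    # Four-base words, set size 8 (Table I), preferred range of 18 to 40 nt.
    for k, length, size in options_in_range(4, 8, 18, 40):
        print(f"{k} words, {length} nt, repertoire of {size} tags")
```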

Preferably, minimally cross-hybridizing sets comprise words that make approximately
equivalent contributions to duplex stability as every other word in the set. In this way, the
stability of perfectly matched duplexes between every word and its complement is
approximately equal. Guidance for selecting such sets is provided by published techniques for
selecting optimal PCR primers and calculating duplex stabilities, e.g. Rychlik et al, Nucleic
Acids Research, 17: 8543-8551 (1989) and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl.
Acad. Sci., 83: 3746-3750 (1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259
(1991); and the like. For shorter tags, e.g. about 30 nucleotides or less, the algorithm described
by Rychlik and Wetmur is preferred and for longer tags, e.g. about 30-35 nucleotides or
greater, an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA
Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently employed.
Clearly, there are many approaches available to one skilled in the art for designing sets of
minimally cross-hybridizing words within the scope of the invention. For example, to
minimize the effects of different base-stacking energies of terminal nucleotides when words are
assembled, words may be provided that have the same terminal nucleotides. In this way, when
subunits are linked, the sum of the base-stacking energies of all the adjoining terminal
nucleotides will be the same, thereby reducing or eliminating variability in tag melting
temperatures.

For use with the invention, words or sub-assemblies of words are initially synthesized
as single stranded oligonucleotides using conventional solid phase synthetic methods, e.g. using
a commercial DNA synthesizer, such as PE Applied Biosystems (Foster City, CA) model 392
DNA synthesizer, or like instrument. Preferably, the words or sub-assemblies of words are
synthesized within a longer oligonucleotide having appropriate restriction endonuclease
recognition sites and primer binding sites to facilitate later manipulation. Preferably, such
chemically synthesized oligonucleotides are rendered double stranded by providing a primer
which binds to one end of the oligonucleotides and which is extended the length of the
oligonucleotides with a DNA polymerase in the presence of the four dNTPs. For example, in a
preferred embodiment the following oligonucleotide (shown in the 5'->3' orientation)
containing two words may be synthesized chemically (SEQ ID NO: 1):



PstI  BseRI  BbsI                 Bsp120I  BbvI  HindIII

cgacacctgcagaggagatgaagacga [word] [word] gggcccatgctgcaagcttaccg

Formula I

In this example, forward and reverse primers shown below may be used to render the
oligonucleotide double stranded so that the indicated restriction endonuclease recognition sites
are formed.

5'-cgacacctgcagaggag          5'-FAM-cggtaagcttgcagcat

Forward primer                Reverse primer
(SEQ ID NO: 2)                (SEQ ID NO: 3)
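The relationship between Formula I and its two primers can be checked mechanically. In the sketch below the flanking sequences and primers are copied from the text; the recognition sequences in `SITES` are an assumption, taken from general restriction enzyme references rather than from this document, and the two words inserted are simply the first two entries of the 4(3) column of Table I.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lower-case DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Flanking sequences and primers as given with Formula I.
LEFT_FLANK = "cgacacctgcagaggagatgaagacga"
RIGHT_FLANK = "gggcccatgctgcaagcttaccg"
FORWARD_PRIMER = "cgacacctgcagaggag"
REVERSE_PRIMER = "cggtaagcttgcagcat"

# Assumed recognition sequences (from general enzyme references, not the text).
SITES = {
    "PstI": "ctgcag",
    "BseRI": "gaggag",
    "BbsI": "gaagac",
    "Bsp120I": "gggccc",
    "BbvI": "gcagc",
    "HindIII": "aagctt",
}


def formula_one(word1: str, word2: str) -> str:
    """Top strand of Formula I carrying the two given words."""
    return LEFT_FLANK + word1 + word2 + RIGHT_FLANK


def site_present(seq: str, site: str) -> bool:
    """True if the site occurs on either strand of the duplex form of seq."""
    return site in seq or reverse_complement(site) in seq


if __name__ == "__main__":
    oligo = formula_one("gatt", "tgat")
    print("forward primer matches 5' end:", oligo.startswith(FORWARD_PRIMER))
    print("reverse primer anneals to 3' end:",
          reverse_complement(oligo).startswith(REVERSE_PRIMER))
    for name, site in SITES.items():
        print(name, site_present(oligo, site))
```

The check does not look for additional, unintended sites that a particular pair of words might create, which a practical design would also need to exclude.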
Here the reverse primer is shown with a fluorescent label attached to its 5' end to facilitate
purification. "FAM" is a fluorescein dye available commercially, e.g. PE Applied Biosystems
(Foster City, CA). Alternatively, the 64 double stranded oligonucleotides containing the
two-word combinations may be constructed by separately synthesizing both strands and then
annealing them together for cloning into a conventional cloning vector.

In embodiments where synthesis errors are eliminated by "self-selection" (described
more fully below), the oligonucleotide of Formula I may be synthesized combinatorially, as
disclosed in Brenner et al, International patent application PCT/US96/09513, so that a mixture
of oligonucleotides is produced, the components of the mixture being oligonucleotides having
different words. For example, if the four-base words of Table I are employed, then the
mixture corresponding to Formula I would consist of 64 different sequences, i.e. every possible
two-word sequence. In embodiments where synthesis errors are eliminated by confirmatory
sequencing, the oligonucleotides of Formula I are synthesized separately followed by separate
insertion into cloning vectors and sequencing to confirm that each word sequence is correct.
As above, if the four-base words of Table I are employed, then 64 separate clonings and
sequence determinations would be required. After such confirmatory sequencing, the 64 clones
are combined for use in the method of the invention.

Oligonucleotide tags produced by way of the invention may be assembled from words
or sub-assemblies of words either by stepwise additions in a plurality of cycles of cleavage and
ligation of preferably identically sized adaptors, or in stages of convergent assembly of
fragments, each of such fragments comprising increasingly larger oligonucleotide precursors.
Examples of both approaches are illustrated in Figures 1a (stepwise additions) and 1b
(convergent assembly). In Figure 1a, vector (100) is prepared for each sequence of words
"-w1-w2-". The presence of two words in this example is only for purposes of illustration. In
this embodiment, any number of words can be used. The practical constraint is the
requirement that vector (100) be prepared for every sequence of words. Thus, if three
four-base words of Table I are employed, then 512 (=8x64) vectors must be prepared and their
sequences confirmed.
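The vector counts quoted above follow from enumerating every ordered sequence of words. A brief sketch, using the eight four-base words of Table I; the function name is illustrative:

```python
from itertools import product

# Four-base words from the 4(3) column of Table I.
FOUR_BASE_WORDS = ["gatt", "tgat", "taga", "tttg", "gtaa", "agta", "atgt", "aaag"]


def word_sequences(words, words_per_insert: int):
    """Every ordered sequence of the given number of words, concatenated."""
    return ["".join(combo) for combo in product(words, repeat=words_per_insert)]


two_word_inserts = word_sequences(FOUR_BASE_WORDS, 2)
three_word_inserts = word_sequences(FOUR_BASE_WORDS, 3)

if __name__ == "__main__":
    print("two-word inserts to clone and sequence:", len(two_word_inserts))
    print("three-word inserts to clone and sequence:", len(three_word_inserts))
```

Each extra word multiplies the number of clones to be confirmed by the set size, which is the practical constraint the paragraph describes.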

Adjacent to words (108) are cleavage sites (107) and (109) of type IIs restriction
endonucleases, r2 and r3, recognizing sites (106) and (110), respectively. Adjacent to, and
upstream of, restriction site (106) is restriction site (104) recognized by restriction
endonuclease, r1. Flanking the entire assembly of restriction sites and words are optional
primer binding sites (102) and (112), which may be used to copy the oligonucleotide tag for
insertion into a vector as taught by Brenner et al, International application PCT/US96/09513.

In the preferred embodiment of Figure 1a, vector (100) serves (114) as a starting
material for the tag assembly process, i.e. at the start of the process, i=1 in the subscript of
insert (120). Note that the process entails the successive insertion of the following element, or
cassette:
where "w" is a word, "N" is a nucleotide, and k is an integer equal to 1, 2, 3, or 4. The term
"(N)k" is equivalent to element (109) of Figure 1a. As described above, preferably k is equal
to 1 or 2, which is the length of the protruding strand resulting from cleavage with the
preferred type IIs restriction endonucleases of the invention. r3 is virtually any type IIs
restriction endonuclease which allows a predictable sequence (109) to be engineered into
vector (100). Exemplary r3's include Alw I, Bbs I, Bbv I, Bci VI, Bpm I, Bsa MI, Bse GI, Bsr
DI, Ear I, Fau I, Mbo II, and the like. Preferably, r3 leaves a 1 or 2 nucleotide protruding
strand after cleavage. Likewise, r2 is virtually any type IIs restriction endonuclease which
allows a predictable sequence (107) to be engineered into vector (100). r2 may be selected
from the same group of type IIs restriction endonucleases as r3, but preferably for a given
vector r3 and r2 are different.

Cycles of word addition in the preferred embodiment, illustrated in Figure 1a, begin
with the step of cleaving (122) vector (121) with r1 and r2, to remove segment (123), thereby
leaving opened vector (124), which is then isolated using conventional protocols. In this
embodiment, r2 cleaves the oligonucleotide tag precursor at the upstream-most word of the
tag. Separately, restriction endonucleases r1 and r3 recognizing restriction sites (104) and
(110), respectively, are used to cleave (116) vector (100) to produce fragment (118), which is
inserted (126) into opened vector (124) to form vector (128), thereby elongating the
oligonucleotide tag precursors by two words. The cycles are repeated (130) until an
oligonucleotide tag repertoire of the desired length is obtained. At such point, the
oligonucleotide tags may be excised from vector (128) by digesting with r2 and r3.

Alternatively, repertoires may be synthesized in accordance with the invention with a
convergent strategy as illustrated in Figure 1b. Vector (150), which may be identical to vector
(100), contains the following elements: restriction site (152) for restriction endonuclease r1,
restriction site (154) for restriction endonuclease r2, which has cleavage site (155), one or
more words (156), and restriction site (158), which has cleavage site (157). Optionally, vector
(150) may also contain flanking primer binding sites as with vector (100) (not shown) for
producing copies of the oligonucleotide tags or their precursors. Two aliquots (160) and (162)
are taken of vector (150). In aliquot (160), vector (150) is digested with r1 and r2 so that
fragment (161) is excised and opened vector (166) is formed. Separately, in aliquot (162),
vector (150) is digested with r1 and r3 so that 2-word fragment (164) is excised. After
purification, 2-word fragment (164) is inserted and ligated (168) into opened vector (166) to
form vector (170), which contains oligonucleotide tag precursors consisting of four words
each. These steps are repeated using vector (170) as the starting material. That is, two
aliquots (174) and (176) are taken of vector (170). In aliquot (174), vector (170) is digested
with r1 and r2 so that fragment (175) is excised and opened vector (180) is formed.
Separately, in aliquot (176), vector (170) is digested with r1 and r3 so that 4-word fragment
(178) is excised. After purification, 4-word fragment (178) is ligated (182) into opened vector
(180) to form vector (184), which contains oligonucleotide tag precursors consisting of eight
words each. Additional cycles may be carried out, or if the desired length of the tags is 8
words, then the oligonucleotide tags may be excised (186) by digesting with r2 and r3.
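
The difference between adding a fixed number of words per cycle and the convergent strategy can be illustrated with a short calculation. The following Python sketch is illustrative only; the function names and the assumption that each convergent cycle exactly doubles the word count are not taken from the specification.

```python
def stepwise_cycles(start_words: int, target_words: int, words_per_cycle: int = 2) -> int:
    """Cycles needed when every cycle appends a fixed number of words."""
    if start_words < 1 or words_per_cycle < 1:
        raise ValueError("word counts must be positive")
    remaining = max(0, target_words - start_words)
    return -(-remaining // words_per_cycle)  # ceiling division


def convergent_cycles(start_words: int, target_words: int) -> int:
    """Cycles needed when every cycle joins two precursors, doubling the word count."""
    if start_words < 1:
        raise ValueError("start_words must be positive")
    cycles, words = 0, start_words
    while words < target_words:
        words *= 2  # assumption: fragment and opened vector carry equal word counts
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Two-word starting vector, eight-word target.
    print(stepwise_cycles(2, 8), convergent_cycles(2, 8))
```

Under these assumptions the convergent route reaches eight words in two cycles, in agreement with the 2-word, 4-word, and 8-word stages described for vectors (150), (170), and (184).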

Repertoires of oligonucleotide tags may also be produced in accordance with the
invention by repeated additions of words with self-selection during the ligation step. In this
embodiment, the length of the protruding strand produced by cleavage with a type IIs
restriction endonuclease is the same as the length of a word. When an oligonucleotide tag
precursor is cleaved at a word, cleavage occurs precisely at the upstream and downstream
boundaries of a word, i.e. across a word, as shown below:



       cleavage site
            ↓
5'-... nnnnn-xxxx-xxxx-xxxx-nnnnn
3'-... nnnnn-xxxx-xxxx-xxxx-nnnnn
                 ↑
            cleavage site


5'-... nnnnn      xxxx-xxxx-xxxx-nnnnn
3'-... nnnnn-xxxx      xxxx-xxxx-nnnnn



where the segments "-xxxx-" represent words consisting of four nucleotides each. Preferably,
in this embodiment, word lengths of either 3, 4, or 5 nucleotides are employed. A preferred
implementation of this embodiment is illustrated in Figure 2. Vector (200), produced from
conventional starting materials, includes the following elements: restriction site for r4 (204),
restriction site for r5 (206), restriction site for r6 (208), cleavage site (209), a plurality of
words (210), restriction site for r7 (212), and a restriction site for r8 (214). As with vector
(100), the above series of elements may be flanked by optional primer binding sites (202) and
(216) so that the oligonucleotide tag precursors may be conveniently replicated, e.g. by PCR
amplification.

Vector (221), which may be a sample of starting vector (200) or a previously
processed vector, is cleaved (224) with r4 and r6 to produce fragment (225) and opened vector
(228), which is isolated using conventional protocols. r6 is a type IIs restriction endonuclease
which cleaves across the upstream-most word of the oligonucleotide tag precursor of vector
(228). Vector (228) is actually a mixture by virtue of the different oligonucleotide tag
precursors. In particular, the protruding strand of end (226) is present in N
sequences, where N is the number of words in the minimally cross-hybridizing set being used.
Separately, a sample of vector (200) is cleaved (222) with r4 and r8 to produce fragment
(218), which is isolated. Fragment (218) is a mixture containing N^2 components in this
example, where again N is the number of words in the minimally cross-hybridizing set being
used. N is to the second power because the fragment contains all possible combinations of two
consecutive words. Element (220) of fragment (218) is the single-stranded form of the second,
or downstream-most, word of vector (200). Fragment (218) is combined with opened vector
(228) under conditions that permit the single-stranded forms of the words (220) and (226) to
form perfectly matched duplexes. Because of the minimally cross-hybridizing property of
the protruding strands, these conditions are readily met. Strands that are not complementary
or that contain failure sequences will not form perfectly matched duplexes and will not be
ligated. In this sense, the words in the protruding strands are "self-selecting." After insertion
and ligation (230), vector (232) is formed which contains an elongated oligonucleotide tag
precursor. The cleavage and insertion steps are repeated (234) until an oligonucleotide tag of
the desired length is obtained, after which the oligonucleotide tag repertoire may be excised by
cleaving with r5 and r7.
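
The self-selection step can be pictured as an all-or-nothing complementarity check between protruding strands. The following Python sketch is a simplified model, not the claimed method: it assumes ligation occurs only for exact reverse complements, it ignores hybridization thermodynamics, and it borrows the eight-word set listed in Example 3 (the first entry of which is garbled in the source and is read here as "gatt"). All function names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("acgt", "tgca")

# Assumed word set (Example 3); "gatt" is a reading of a garbled entry.
WORDS = ["gatt", "tgat", "taga", "tttg", "gtaa", "agta", "atgt", "aaag"]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_ligate(vector_overhang: str, insert_overhang: str) -> bool:
    """True only for a perfectly matched duplex; both strands written 5'->3'."""
    return insert_overhang == reverse_complement(vector_overhang)


def count_ligatable_pairs(words) -> int:
    """Count vector/insert overhang pairings that form perfect duplexes."""
    vector_overhangs = list(words)                              # N protruding strands
    insert_overhangs = [reverse_complement(w) for w in words]   # their complements
    return sum(can_ligate(v, i) for v, i in product(vector_overhangs, insert_overhangs))


if __name__ == "__main__":
    n = len(WORDS)
    print(count_ligatable_pairs(WORDS), "of", n * n, "pairings are perfectly matched")
```

In this model each protruding word on the opened vector pairs with exactly one protruding strand on the fragment, so only N of the N x N possible pairings are ligated.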

The following examples serve to illustrate the present invention and are not meant to
be limiting. Selection of many of the reagents, e.g. enzymes, vectors, and other materials;
selection of reaction conditions and protocols; and material specifications, e.g. word length and
composition, tag length, repertoire complexity, and the like, are matters of design choice which
may be made by one of ordinary skill in the art. Extensive guidance is available in the
literature for applying particular protocols for a wide variety of design choices made in
accordance with the invention, e.g. Sambrook et al, Molecular Cloning, Second Edition (Cold
Spring Harbor Laboratory, New York, 1989); Ausubel et al, editors, Current Protocols in
Molecular Biology (John Wiley & Sons, New York, 1997); and the like.

Example 1 

Repertoire Synthesis by Repeated Cycles of Cleavage,
Self-Selection, Ligation, and Amplification

In this example, an oligonucleotide tag repertoire is produced such that each
oligonucleotide tag consists of eight words of four nucleotides. The procedure outlined in
Figure 2 is followed. A vector, corresponding to vector (200), is constructed by first inserting
the following oligonucleotide (SEQ ID NO: 4) into a Bam HI and Eco RI digested pUC19:

        PacI             BseRI   Bsp120     BbsI   EcoRI  BamHI
         ↓                 ↓       ↓          ↓      ↓      ↓
aattg ttaattaag gatgagctcactcctc gggcccg cataagtcttcgaattcg
    c aattaattc ctactcgagtgaagag cccgggc gtattcagaagcttaagcctag

                        Formula II



Separately, the oligonucleotide of Formula I and forward and reverse primers (SEQ ID NO: 2
and SEQ ID NO: 3) are synthesized using a conventional DNA synthesizer, e.g. PE Applied
Biosystems (Foster City, CA) model 392. The oligonucleotide of Formula I is a mixture
containing a repertoire of 64 two-word oligonucleotide tag precursors. The four-nucleotide
words of Table I are employed. After amplification by PCR, the amplification product is
digested with Bbs I to give the following two products:

... gaagacga      word-word-gg ...
... cttctgct-word      word-cc ...

The products are re-ligated, amplified by PCR, and digested with Bbv I to give the following
two products:

... gaagacga-word      word-gg ...
... cttctgct-word-word      cc ...

The products are again re-ligated and amplified by PCR. By this sequence of cleavages and
religations, any words consisting of failure sequences are selected against by the ligation event,
i.e. words with failure sequences will not religate in the mixture, and thus, will not be
amplified. The final product is digested with Pst I and Hind III and inserted into a Pst I/Hind
III-digested pUC19 to give the following construct (SEQ ID NO: 5):

       Pst I BseRI  BbsI              Bsp 120       Hind III
...cgacctgcagaggagatgaagacga-wordword-gggcccaatgctgcaagcttggcg...
...gctggacgtctcctctacttctgct-wordword-cccgggttacgacgttcgaaccgc...
                                                 ↑
                                               Bbv I

where Pst I, Bse RI, Bbs I, Bsp 120, and Bbv I correspond to r4, r5, r6, r7, and r8 of Figure 2,
respectively. After amplification in a suitable host, the plasmid is isolated and cleaved with
Pst I and Bbs I to give an opened vector with the following upstream and downstream (SEQ
ID NO: 6) ends:

...cgacctgca                 wordword-gggcccaatgctgcaagcttggcg...
...gctgg                         word-cccgggttacgacgttcgaaccgc...

Separately, a portion of the amplified oligonucleotide of Formula I is digested with Pst I and
Bbv I to give the following fragment (SEQ ID NO: 7):

    gaggagatgaagacga-word
acgtctcctctacttctgct-wordword

10 

This fragment is inserted into the above vector opened by digestion with Bbs I and Pst I to give
the following construct (SEQ ID NO: 8):

...gcagaggagatgaagacga-wordwordword-gggcccaatgctgcaagcttggcg...
...cgtctcctctacttctgct-wordwordword-cccgggttacgacgttcgaaccgc...

which contains an oligonucleotide tag precursor of three words. The steps of cleaving,
inserting, and amplification are repeated until a construct containing eight words is obtained.
Preferably, at each step, reactants, e.g. vectors and/or inserts, are provided in amounts that are
at least ten times the complexity of the reactant. When synthesis is complete, the eight-word
construct is cleaved with Bse RI and Bsp 120 and the following fragment containing the
oligonucleotide tag repertoire is isolated:

   (word)8 g

ct (word)8 cccgg

The isolated fragment is then inserted into the Bse RI/Bsp 120-digested vector of Formula II,
which vector is used to transform a suitable host. The construct is ready for inserting
polynucleotides, such as cDNAs, into the Eco RI restriction site to form tag-polynucleotide
conjugates in accordance with the method of Brenner et al, International patent application
PCT/US96/09513.
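
The guidance that reactants be provided at no less than ten times their complexity can be turned into a rough count. The Python sketch below is illustrative only: it assumes that complexity means the number of distinct sequences (N^k for k words drawn from a set of N), that the amount is counted in molecules, and that N = 8, which is inferred from the 64 two-word precursors of this example. The function names are hypothetical.

```python
def repertoire_complexity(set_size: int, words_per_tag: int) -> int:
    """Number of distinct tags when each of k positions holds any of N words."""
    if set_size < 1 or words_per_tag < 1:
        raise ValueError("set_size and words_per_tag must be positive")
    return set_size ** words_per_tag


def minimum_molecules(set_size: int, words_per_tag: int, fold_excess: int = 10) -> int:
    """Smallest molecule count giving the stated fold excess over complexity."""
    return fold_excess * repertoire_complexity(set_size, words_per_tag)


if __name__ == "__main__":
    for k in (2, 3, 8):
        print(k, repertoire_complexity(8, k), minimum_molecules(8, k))
```

With these assumptions the two-word stage has a complexity of 64 and the eight-word stage a complexity of 16,777,216, so the ten-fold guideline corresponds to roughly 1.7 x 10^8 molecules at the final stage.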

Example 2 

Repertoire Synthesis by Convergent Assembly of
Error-free Oligonucleotide Tag Precursors

In this example, an oligonucleotide tag repertoire is produced following the procedure
outlined in Figure 1b. Each oligonucleotide tag consists of eight words of six nucleotides each
(selected from those listed in Table I) to give the repertoire having an expected complexity of
9^8, or about 4.3 x 10^7. For each of the 9x9=81 two-word combinations, an oligonucleotide
(SEQ ID NO: 9) of the following form is synthesized:

       PstI          BseRI                          Bsp 120
cgacac ctgcagt tatcg gaggaga tgaagacgg [word][word] gggccca tat-

        -atccgt ctgcacaagctt accg
                  ↑     ↑
                BsgI  HindIII

                Formula III
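
The stated figures (81 two-word combinations and an expected complexity of 9^8) can be checked with a brief calculation. In the Python sketch below the word labels W1 to W9 are placeholders, since the words of Table I are not reproduced in this section, and the function names are hypothetical.

```python
from itertools import product


def two_word_combinations(words):
    """Every ordered pair of words, i.e. every two-word precursor to synthesize."""
    return [first + "|" + second for first, second in product(words, repeat=2)]


def complexity_after_doublings(n_precursors: int, doublings: int) -> int:
    """Complexity after repeated convergent cycles.

    Each cycle joins every precursor in the pool with every precursor in the
    pool, so the number of distinct sequences is squared.
    """
    complexity = n_precursors
    for _ in range(doublings):
        complexity *= complexity
    return complexity


if __name__ == "__main__":
    placeholder_words = [f"W{i}" for i in range(1, 10)]  # stand-ins for Table I
    pairs = two_word_combinations(placeholder_words)
    print(len(pairs), complexity_after_doublings(len(pairs), 2))
```

Two doubling cycles starting from the 81 sequence-verified two-word precursors give 81^4 = 9^8 = 43,046,721 eight-word tags, in agreement with the value of about 4.3 x 10^7 given above.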



The oligonucleotides of Formula III are rendered double stranded and amplified by providing
forward and reverse primers and conducting a PCR, as described above for the oligonucleotide
of Formula I. After amplification, the oligonucleotides are separately cleaved with Pst I and
Hind III and cloned into a similarly cleaved M13mp18 and suitable hosts are transformed.
Clones are selected and the oligonucleotide inserts are sequenced using conventional
techniques. Such selection and sequencing continue until a vector is obtained for each of the
81 two-word combinations whose sequence is confirmed to be correct. Aliquots of the vectors
are then combined in equal proportions to form an 81-component mixture, after which the
vectors are cleaved with Pst I and Hind III and the word-containing fragment is isolated and
cloned into a similarly cleaved pUC19 to give a construct of the following form (SEQ ID NO:
10):

... ctgcagttatcggaggagatgaagacgg[word][word]gggcccatat-
... gacgtcaatagcctcctctacttctgcc[word][word]cccgggtata-

        -atccgtctgcacaagcttggcg ...
        -taggcagacgtgttcgaaccgc ...

After cloning, the population of vectors is divided into two parts, after which the
vectors in one part are cleaved with Pst I and Bsg I to give the following fragment mixture
(SEQ ID NO: 11):

    gttatcggaggagatgaagacgg[word][word]gg
acgtcaatagcctcctctacttctgcc[word][word]

which is isolated. The vectors in the other part are cleaved with Pst I and Bse RI and the
linearized word-containing vectors are isolated. The word-containing fragments are ligated
into the linearized vectors to form the following construct (SEQ ID NO: 12):

... ctgcagttatcggaggagatgaagacgg[word][word]gg[word][word]-
... gacgtcaatagcctcctctacttctgcc[word][word]cc[word][word]-

        -gggcccatatatccgtctgcacaagcttggcg ...
        -cccgggtatataggcagacgtgttcgaaccgc ...

After cloning, the construct is again divided into two parts and the steps are repeated to give
the final 8-word repertoire having the form:

... gaagacgg([word][word]gg)4 gccc ...
... cttctgcc([word][word]cc)4 cggg ...

This may then be cleaved with Bse RI and Bsg I and re-cloned into a vector similar to that of
Formula II for attachment to polynucleotides.


Example 3 

Construction of an Eight-Word Tag Library
In this example, an eight-word tag library with four-nucleotide words was constructed
from two two-word libraries in vectors pLCV-2 and pUCSE-2. Prior to construction of the
eight-word tag library, 64 two-word double stranded oligonucleotides were separately inserted
into pUC19 vectors and propagated. These 64 oligonucleotides consisted of every possible
two-word pair made up of four-nucleotide words selected from an eight-word minimally
cross-hybridizing set described in Brenner, U.S. patent 5,604,097. After the identities of the inserts
were confirmed by sequencing, the inserts were then amplified by PCR and equal amounts of
each amplicon were combined to form the inserts of the two-word libraries in vectors pLCV-2
and pUCSE-2. These were then used as described below to form an eight-word tag library in
pUCSE, after which the eight-word insert was transferred to vector pNCV3, which contains
additional primer binding sites and restriction sites to facilitate tagging and sorting
polynucleotide fragments.


A. Construction of two-word sequences in pUCSE. 

pUC19 was digested to completion with Sap I and Eco RI using the manufacturer's
protocol and the large fragment was isolated. All restriction endonucleases unless otherwise
noted were purchased from New England Biolabs (Beverly, MA). The small Sap I-Eco RI
fragment was removed to eliminate the β-gal promoter sequence, which was found to skew the
representation of some combinations of words in the final library. The following adaptor
(SEQ ID NO: 13) was ligated to the isolated large fragment in a conventional ligation reaction
to give plasmid pUCSE as a ligation product.



Eco RI   Pst I  Eco RV  Hind III
  ↓        ↓       ↓       ↓
aattctagactgcagttgatatcttaagctt
    gatctgacgtcaactatagaattcgaacga



A bacterial host was transformed by the ligation product using electroporation, after which the
transformed bacteria were plated, a clone was selected, and the insert of its plasmid was
sequenced for confirmation. pUCSE isolated from the clone was then digested with Eco RI
and Hind III using the manufacturer's protocol and the large fragment was isolated. The
following adaptor (SEQ ID NO: 14) was ligated to the large fragment to give plasmid
pUCSE-D1, which contained the first di-word (underlined).



20 

         BseRI
Eco RI Pst I     Bbs I          Bsp120 I   Hind III
  ↓     ↓  ↓       ↓               ↓           ↓
aattctgcagaggagatgaagacgaaaagaaaggggcccatgctgca
    gacgtctcctctacttctgcttttctttccccgggtacgacgttcga
                                           ↑
                                         Bbv I

                  Formula I

30 

Further plasmids, pUCSE-D2 through pUCSE-D64, containing di-words were separately
constructed from pUCSE-D1 by digesting it with Pst I and Bsp120 I and separately ligating
the following adaptors (SEQ ID NO: 15) to the large fragment.

    gaggagatgaagacga[word][word]g
acgtctcctctacttctgct[word][word]cccgg

                Formula II

The words of the top strand were selected from the following minimally cross-hybridizing set:
gatt, tgat, taga, tttg, gtaa, agta, atgt, and aaag. After cloning and isolation, the inserts of the
vectors were sequenced to confirm the identities of the di-words.
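
The minimally cross-hybridizing property of this word set, and the count of 64 di-words, can be checked directly. The Python sketch below is illustrative: it treats the number of mismatches between a word and the complement of another word as the Hamming distance between the two words, it reads the first (garbled) entry of the set as "gatt", and its function names are hypothetical.

```python
from itertools import combinations, product

# Word set as listed above; "gatt" is an assumed reading of a garbled entry.
WORDS = ["gatt", "tgat", "taga", "tttg", "gtaa", "agta", "atgt", "aaag"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(words) -> int:
    """Smallest mismatch count between any two distinct words of the set."""
    return min(hamming(a, b) for a, b in combinations(words, 2))


def di_words(words):
    """All ordered two-word concatenations (the D1 to D64 inserts)."""
    return [a + b for a, b in product(words, repeat=2)]


if __name__ == "__main__":
    print(min_pairwise_distance(WORDS), len(di_words(WORDS)))
```

Under the assumed reading, every pair of words differs at three of four positions, and the set yields 64 di-words, including the "aaagaaag" di-word carried by the Formula I adaptor.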

B. Construction of pLCV.

Plasmid cloning vector pLCV-D1 was created from plasmid vector pBC.SK'
(Stratagene) as follows, using the following oligonucleotides:

S-723 (SEQ ID NO: 16)

5'-CGA GAA AGA GGG ATA AGG CTC GAG CTT AAT TAA GAG TCG ACG AAT
TCG GGC CCG GAT CCT GAC TCT TTC TCC CT-3'

S-724 (SEQ ID NO: 17)

5'-CTA GAG GGA GAA AGA GTC AGG ATC CGG GCC CGA ATT CGT CGA CTC
TTA ATT AAG CTC GAG CCT TAT CCC TCT TTC TCG GTA C-3'

S-785 (SEQ ID NO: 18)

5'-TCG AGG CAT AAG TCT TCG AAT TCC ATC ACA CTG GGA AGA CAA CGT
AG-3'

S-786 (SEQ ID NO: 19)

5'-GAT CCT ACG TTG TCT TCC CAG TGT GAT GGA ATT CGA AGA CTT ATG
CC-3'

S-960 (SEQ ID NO: 20)

5'-TCG ATT AAT TAA CAA GCT TTG GGC CCT CGA GCA TAA GTC TTC TGC
AGA ATT CGG ATC CAT CGA TGG TCA TAG C-3'

S-961 (SEQ ID NO: 21)

5'-TGT TTC CTG CCA CAC AAC ATA CGA GCC GGA AGC GGC CGC TCT
AGA-3'

S-962 (SEQ ID NO: 22)

5'-AGC GTC TAG AGC GGC CGC TTC CGG CTC GTA TGT TGT GTG GCA GGA
AAC AGC TAT GAC CAT C-3'

S-963 (SEQ ID NO: 23)

5'-GAT GGA TCC GAA TTC TGC AGA AGA CTT ATG CTC GAG GGC CCA AAG
CTT GTT AAT TAA-3'

S-1105 (SEQ ID NO: 24)

5'-TCGA GGG CCC GCA TAA GTC TTC-3'

S-1106 (SEQ ID NO: 25)

5'-TCGA GAA GAC TTA TGC GGG CCC-3'

Oligonucleotides S-723 and S-724 were kinased, annealed together, and ligated to
pBC.SK' which had been digested with KpnI and XbaI and treated with calf intestinal alkaline
phosphatase, to create plasmid pSW143.1.

Oligonucleotides S-785 and S-786 were kinased, annealed together, and ligated to
plasmid pSW143.1, which had been digested with XhoI and BamHI and treated with calf
intestinal alkaline phosphatase, to create plasmid pSW164.02.

Oligonucleotides S-960, S-961, S-962, and S-963 were kinased and annealed together to
form a duplex consisting of the four oligonucleotides. Plasmid pSW164.02 was digested with
XhoI and SapI. The digested DNA was electrophoresed in an agarose gel, and the
approximately 3045 bp product was purified from the appropriate gel slice. Plasmid pUC4K
(from Pharmacia) was digested with PstI and electrophoresed in an agarose gel. The approx.
1240 bp product was purified from the appropriate gel slice. The two plasmid products (from
pSW164.02 and pUC4K) were ligated together with the S-960/961/962/963 duplex to create
plasmid pLCVa.

DNA from Adenovirus 5 (New England Biolabs) was digested with PacI and Bsp120I,
treated with calf intestinal alkaline phosphatase, and electrophoresed in an agarose gel. The
approx. 2853 bp product was purified from the appropriate gel slice. This fragment was ligated
to plasmid pLCVa, which had been digested with PacI and Bsp120I, to create plasmid
pSW208.14.

Plasmid pSW208.14 was digested with XhoI, treated with calf intestinal alkaline
phosphatase, and electrophoresed in an agarose gel. The approx. 5374 bp product was purified
from the appropriate gel slice. This fragment was ligated to oligonucleotides S-1105 and S-1106
(which had been kinased and annealed together) to produce plasmid pLCVb, which was then
digested with Eco RI and Hind III. The large fragment was isolated and ligated to the Formula I
adaptor (SEQ ID NO: 14) to give pLCV-D1.

As above for pUCSE, further plasmids, pLCV-D2 through pLCV-D64, containing
di-words were separately constructed from pLCV-D1 by digesting it with Pst I and Bsp120 I,
isolating the large fragment, and ligating an adaptor of Formula II. After cloning and
isolation, the inserts of the vectors were sequenced to confirm the identities of the di-words.

C. Construction of two-word libraries, pUCSE-2 and pLCV-2.

Each of the vectors pLCV-D1 through -D64 and pUCSE-D1 through -D64 was
separately amplified by PCR. The components of the reaction mixture were as follows:

10 µl template (about 1-5 ng)
10 µl 10x Klentaq™ buffer (Clontech Laboratories, Palo Alto, CA)
2.5 µl biotinylated DF primer at 100 pmoles/µl
2.5 µl biotinylated DR primer at 100 pmoles/µl
2.5 µl 10 mM deoxynucleoside triphosphates
5 µl DMSO
66.5 µl H2O
1 µl Advantage Klentaq™ (Clontech Laboratories, Palo Alto, CA)
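
As a consistency check on the reaction mixture, the component volumes can be summed and the final concentrations derived by simple dilution. The Python sketch below is illustrative; the dictionary keys and function names are hypothetical, and the stock concentrations are those stated in the list above.

```python
# Component volumes in microlitres, as listed above.
REACTION_UL = {
    "template": 10.0,
    "10x buffer": 10.0,
    "DF primer": 2.5,
    "DR primer": 2.5,
    "dNTPs": 2.5,
    "DMSO": 5.0,
    "H2O": 66.5,
    "polymerase": 1.0,
}


def total_volume_ul(components) -> float:
    """Total reaction volume in microlitres."""
    return sum(components.values())


def final_concentration(stock_conc: float, stock_ul: float, total_ul: float) -> float:
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * stock_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(REACTION_UL)
    print("total volume (ul):", total)
    print("primer (pmol/ul):", final_concentration(100.0, REACTION_UL["DF primer"], total))
    print("dNTP stock diluted to (mM):", final_concentration(10.0, REACTION_UL["dNTPs"], total))
    print("buffer (x):", final_concentration(10.0, REACTION_UL["10x buffer"], total))
```

The listed volumes sum to 100 µl, which gives each primer at 2.5 pmoles/µl, the deoxynucleoside triphosphate stock diluted 40-fold to 0.25 mM, and the buffer at 1x.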



The temperature of the reactions was controlled as follows: 94°C for 3 min; 25 cycles of
94°C for 30 sec, 60°C for 30 sec, and 72°C for 10 sec; followed by 72°C for 3 min, then
4°C. The DF and DR primer binding sites were upstream and downstream portions of the
vectors selected to give amplicons of 104 basepairs in length. After the reactions were
completed, 5 µl of each PCR product were separated by polyacrylamide gel electrophoresis (20%
with 1x TBE) to confirm by visual inspection that the reaction yields were approximately the
same for each PCR. After such confirmation, using conventional protocols, 10 µl of each
PCR was extracted twice with phenol and once with chloroform, after which the DNA in the
aqueous phase was precipitated with ethanol. After resuspension in 200 µl of 1x NEB buffer
#2 (New England Biolabs, Beverly, MA), the DNA was cleaved with Bbv I and Eco RI by
adding the enzymes in 50 µl of the manufacturer's recommended buffer. The digestion resulted
in the production of three fragments: a biotinylated fragment of 38 basepairs, a
di-word-containing fragment of 29 basepairs, and a biotinylated fragment of 37 basepairs. After
completion of the reaction, the excess biotinylated primers were removed by adding 50 µl 50%
Ultralink (streptavidin-Sepharose, Pierce Chemical Co., Rockford, IL) and vortexing the
mixture at room temperature for 30 min. The Ultralink material was separated from the
reaction mixture by centrifugation, after which approximately half of the mixture was
separated by polyacrylamide gel electrophoresis (20% gel). The 29-basepair band was cut out
of the gel and the 29-basepair fragment was eluted using the "crush and soak" method, e.g.
Sambrook et al, Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New
York, 1989). This material was then ligated into either pLCV-D1 or pUCSE-D1 after the
latter were digested with Bbs I and Eco RI and treated with calf intestine alkaline phosphatase,
using manufacturer's recommended protocols.
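
Two figures in the preceding protocol lend themselves to a quick arithmetic check: the programmed hold time of the thermal cycling and the agreement between the digestion fragment sizes and the amplicon length. The Python sketch below is illustrative; the constant and function names are hypothetical, ramp times are ignored, and the fragment check simply adds the stated basepair counts without modelling overhangs.

```python
# Hold times in seconds, taken from the cycling conditions described above.
INITIAL_DENATURATION_S = 180      # 3 min
CYCLE_STEPS_S = [30, 30, 10]      # denature, anneal, extend
CYCLES = 25
FINAL_EXTENSION_S = 180           # 3 min


def programmed_hold_seconds() -> int:
    """Total programmed hold time, excluding ramping and the final cold hold."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


def fragments_account_for_amplicon(fragments_bp, amplicon_bp: int) -> bool:
    """True if the stated fragment sizes add up to the stated amplicon length."""
    return sum(fragments_bp) == amplicon_bp


if __name__ == "__main__":
    seconds = programmed_hold_seconds()
    print(seconds, "s =", seconds // 60, "min", seconds % 60, "s")
    print(fragments_account_for_amplicon([38, 29, 37], 104))
```

The three stated fragments (38, 29, and 37 basepairs) add up to the 104-basepair amplicon, and the programmed holds total a little over 35 minutes.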



D. Construction of pNCV3.

pNCV3 was constructed by first assembling the following fragment (SEQ ID NO: 26)
from synthetic oligonucleotides:

EcoRI
  ↓
aattctgtaaaacgacggccagtcgccagggtttccccagtcacgacgtgaataaatag-
    gacattttgctgccggtcagcggtcccaaaagggtcagtgctgcacttatttatc-

PacI                                 Bsp120I
ttaattaaggaataggcctctcctcgagctcggtaccgggcccgcataagtcttc-
aattaattccttatccggagaggagctcgagccatggcccgggcgtattcagaag-

    ClaI             EcoRV  SapI      BamHI
atctatcgatgattgaagagcgatatcgctcttcaatcggatccatcc-
tagatagctactaacttctcgctatagcgagaagttagcctaggtagg-
                ↑
               SapI

                                                    HindIII
                                                        ↓
tcaactaattaccacacaacatacgagccggaagcgggtcatagctgtttcctga
agttgattaatggtgtgttgtatgctcggccttcgcccagtatcgacaaaggacttcga

After isolation, the fragment was cloned into Eco RI and Hind III-digested pLCV-D1 using
conventional protocols.



35 



E. Assembly of eight-word library.

The di-words of pLCV-2 were amplified either by PCR or plasmid expansion, the
product was digested with Eco RI and Bbv I, after which the Eco RI-Bbv I fragment was
isolated as insert 1. Two-word library pUCSE-2 was digested with Eco RI, Bbs I, and Pst I,
after which the large fragment was treated with calf intestine alkaline phosphatase to give
vector 1. Vector 1 and insert 1 were combined in a conventional ligation reaction to give
three-word library, pUCSE-3. pUCSE-3 was digested with Eco RI, Bbs I, and Pst I, after
which the large fragment was treated with calf intestine alkaline phosphatase to give vector 2.
Vector 2 and insert 1 were then combined in a conventional ligation reaction to give four-word
library, pUCSE-4. The 4-mer words of pUCSE-4 were amplified either by PCR or plasmid
expansion, the product was digested with Eco RI and Bbv I, after which the Eco RI-Bbv I
fragment was isolated as insert 2. pLCV-2 was digested with Eco RI, Bbs I, and Pst I, after
which the large fragment was treated with calf intestine alkaline phosphatase to give vector 3.
Vector 3 and insert 2 were then combined in a conventional ligation reaction to give five-word
library, pLCV-5. The 5-mer words of pLCV-5 were amplified either by PCR or plasmid
expansion, the product was digested with Eco RI and Bbv I, after which the Eco RI-Bbv I
fragment was isolated as insert 3. pUCSE-4 was digested with Eco RI, Bbs I, and Pst I, after
which the large fragment was treated with calf intestine alkaline phosphatase to give vector 4.
Vector 4 and insert 3 were then combined in a conventional ligation reaction to give eight-word
library, pUCSE-8. The 8-mer words of pUCSE-8 were amplified either by PCR or plasmid
expansion, the product was digested with Bse RI and Bsp120 I, after which the
Bse RI-Bsp120 I fragment was isolated as insert 4. pNCV3 was digested with Bse RI, Bsp120 I, and
Sac I, after which the large fragment was isolated and treated with calf intestine alkaline
phosphatase to give vector 5. Vector 5 was then combined with insert 4 in a conventional
ligation reaction to give the eight-word library pNCV3-8.
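
The word counts of the intermediate libraries follow a simple pattern if each ligation is taken to join a vector and an insert that share one word at the junction. That one-word overlap is an inference from the stated library sizes and from the self-selection scheme of Figure 2, not an explicit statement of this example. The Python sketch below, with hypothetical names, reproduces the sequence of library sizes under that assumption.

```python
def ligate_with_overlap(vector_words: int, insert_words: int, overlap_words: int = 1) -> int:
    """Word count after joining a vector and an insert that share overlap_words words."""
    if min(vector_words, insert_words) < overlap_words:
        raise ValueError("both partners must contain the overlapping word(s)")
    return vector_words + insert_words - overlap_words


# (product library, words in vector, words in insert), following the steps above.
ASSEMBLY_STEPS = [
    ("pUCSE-3", 2, 2),  # vector 1 (from pUCSE-2) + insert 1 (from pLCV-2)
    ("pUCSE-4", 3, 2),  # vector 2 (from pUCSE-3) + insert 1
    ("pLCV-5", 2, 4),   # vector 3 (from pLCV-2)  + insert 2 (from pUCSE-4)
    ("pUCSE-8", 4, 5),  # vector 4 (from pUCSE-4) + insert 3 (from pLCV-5)
]


def assembled_word_counts(steps):
    """Map each product library to its word count under the overlap assumption."""
    return {name: ligate_with_overlap(v, i) for name, v, i in steps}


if __name__ == "__main__":
    for name, words in assembled_word_counts(ASSEMBLY_STEPS).items():
        print(name, words)
```

Under this assumption the four ligations give libraries of 3, 4, 5, and 8 words, matching the names pUCSE-3, pUCSE-4, pLCV-5, and pUCSE-8.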

F. Confirmation Sequencing of a Random Selection of Eight-Word Tags.

The results of the word assembly were tested by sequencing the 8-word inserts of 176
vectors from the pNCV3-8 library. The results of the sequence determinations are
summarized in the following table:

Number of Tags    Result                            Percentage

147               Perfect 8 words                   83.5%
11                Perfect 7 words                   6.2%
8                 No insert                         4.5%
4                 8 words with 1 base deletion      2.2%
3                 8 words with an incorrect word    1.7%
1                 12 words                          0.5%
1                 10 words                          0.5%
1                 9 words                           0.5%
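
The tabulated percentages can be recomputed from the counts. The Python sketch below is illustrative; it assumes the percentages were truncated rather than rounded to one decimal place, which is the convention that reproduces every entry of the table, and it uses integer arithmetic to avoid floating-point artifacts. The names are hypothetical.

```python
# (count, result) pairs as tabulated above.
RESULTS = [
    (147, "Perfect 8 words"),
    (11, "Perfect 7 words"),
    (8, "No insert"),
    (4, "8 words with 1 base deletion"),
    (3, "8 words with an incorrect word"),
    (1, "12 words"),
    (1, "10 words"),
    (1, "9 words"),
]


def truncated_tenths_of_percent(count: int, total: int) -> int:
    """Percentage in tenths of a percent, truncated (835 means 83.5%)."""
    return count * 1000 // total


def summarize(results):
    """Return (result, count, percentage string) rows for the whole table."""
    total = sum(count for count, _ in results)
    rows = []
    for count, label in results:
        tenths = truncated_tenths_of_percent(count, total)
        rows.append((label, count, f"{tenths // 10}.{tenths % 10}%"))
    return rows


if __name__ == "__main__":
    for label, count, pct in summarize(RESULTS):
        print(f"{count:>4}  {label:<32}{pct}")
```

The counts sum to the 176 vectors sequenced, and the truncated percentages agree with the tabulated values.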
<110> Brenner, Sydney

Williams, Steven R. 
<120> Enzymatic synthesis of oligonucleotide tags 
<130> 810-01 
<140> 
<141> 

<150> US 60/103,030 
<151> 1998-10-05 
<160> 26 

<170> Microsoft Word 5.1 

<210> 1 
<211> 58 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> 

<222> 

<223> 

<400> 1 

cgacacctgc agaggagatg aagacgaddd dddddgggcc catgctgcaa 50 
gcttaccg 58 

<210> 2 
<211> 17 
<212> DNA 

<213> Artificial Sequence

<220> No special biological significance. 

<221> Primer. 

<222> n.a. 

<223> 

<400> 2 

cgacacctgc agaggag 17 

<210> 3 
<211> 17 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Primer.

<222> n.a. 

<223> 

<400> 3 

cggtaagctt gcagcat 17 

<210> 4 
<211> 55 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> n.a. 

<223> 

<400> 4 

aattgttaat taaggatgag ctcactcctc gggcccgcat aagtcttcga 50

<210> 5
<211> 57 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance.

<221> Cloning vector.

<222> n.a. 

<223> 

<400> 5 

cgacctgcag aggagatgaa gacgaddddd dddgggccca atgctgcaag 50 
cttggcg 57 



<210> 6 
<211> 32 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector. 

<222> 

<223> 

<400> 6 

ddddddddgg gcccaatgct gcaagcttgg cg 32

<210> 7 
<211> 20 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> n.a.

<223> Preferably, contains fluorescent label. 
<400> 7 

gaggagatga agacgadddd 20 



<210> 8 
<211> 55 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector.

<222> n.a. 

<223> 

<400> 8 

gcagaggaga tgaagacgad dddddddddd dgggcccaat gctgcaagct 50 
tggcg 55 

<210> 9 
<211> 78 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Tag repertoire. 

<222> n.a. 

<223> n.a.

<400> 9
cgacacctgc agttatcgga ggagatgaag acggdddddd ddddddgggc 50
ccatatatcc gtctgcacaa gcttaccg 78

<210> 10 
<211> 72 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector. 

<222> N.a. 

<223> N.a.

<400> 10 

ctgcagttat cggaggagat gaagacggdd dddddddddd gggcccatat 50 
atccgtctgc acaagcttac cg 72

<210> 11 
<211> 36 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> N.a. 

<223> N.a. 

<400> 11 

gttatcggag gagatgaagac ggdddddddd ddddgg 36 

<210> 12 
<211> 86 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector. 

<222> N.a. 

<223> N.a. 

<400> 12 

ctgcagttat cggaggagat gaagacggdd dddddddddd ggdddddddd 50 
ddddgggccc atatatccgt ctgcacaagc ttaccg 86 

<210> 13 
<211> 31 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> N.a. 

<223> N.a. 

<400> 13 

aattctagac tgcagttgat atcttaagct t 31 



<210> 14 
<211> 47 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> N.a.

<223> N.a.

<400> 14

aattctgcag aggagatgaa gacgaaaaga aaggggccca tgctgca 47 



<210> 15 
<211> 25 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> N.a. 

<223> N.a. 

<400> 15 

gaggagatga agacgadddd ddddg 25 



<210> 16 

<211> 74 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide.

<222> N.a. 

<223> N.a. 

<400> 16 

cgagaaagag ggataaggct cgagcttaat taagagtcga cgaattcggg 50 
cccggatcct gactctttct ccct 74 



<210> 17 
<211> 82
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide. 

<222> N.a. 

<223> N.a. 

<400> 17 

ctagagggag aaagagtcag gatccgggcc cgaattcgtc gactcttaat 50 
taagctcgag ccttatccct ctttctcggt ac 82 



<210> 18 
<211> 47 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance.

<221> Oligonucleotide. 

<222> N.a. 

<223> N.a. 

<400> 18 

tcgaggcata agtcttcgaa ttccatcaca ctgggaagac aacgtag 47 



<210> 19 
<211> 47
<212> DNA

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector. 

<222> N.a. 

<223> N.a. 

<400> 19 

gatcctacgt tgtcttccca gtgtgatgga attcgaagac ttatgcc 47 



<210> 20 
<211> 72 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide.

<222> N.a. 

<223> N.a. 

<400> 20 

tcgattaatt aacaagcttt gggccctcga gcataagtct tctgcagaat 50 
tcggatccat cgatggtcat ag 72 



<210> 21 
<211> 45 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide. 

<222> N.a. 

<223> N.a. 

<400> 21 

tgtttcctgc cacacaacat acgagccgga agcggccgct ctaga 45 



<210> 22 
<211> 62 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide.

<222> N.a. 

<223> N.a. 

<400> 22 

agcgtctaga gcggccgctt ccggctcgta tgttgtgtgg caggaaacaa 50 
gctatgacca tc 62 



<210> 23 
<211> 57 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide. 

<222> N.a. 

<223> N.a. 

<400> 23
gatggatccg aattctgcag aagacttatg ctcgagggcc caaagcttgt 50
taattaa 57



<210> 24 
<211> 22 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Oligonucleotide.

<222> N.a. 

<223> N.a. 

<400> 24 

tcgagggccc gcataagtct tc 22 



<210> 25 
<211> 22 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Vector. 

<222> N.a. 

<223> N.a. 

<400> 25 

tcgagaagac ttatgcgggc cc 22 



<210> 26 
<211> 217 
<212> DNA 

<213> Artificial Sequence 

<220> No special biological significance. 

<221> Adaptor. 

<222> N.a. 

<223> N.a. 

<400> 26 

aattctgtaa aacgacggcc agtcgccagg gttttcccag tcacgacgtg 50 
aataaatagt taattaagga ataggcctct cctcgagctc ggtaccgggc 100 
ccgcataagt cttcatctat cgatgattga agagcgatat cgctcttcaa 150 
tcggatccat cctcaactaa ttaccacaca acatacgagc cggaagcggg 200 
tcatagctgt ttcctga 217

We claim:

1. A method of synthesizing a repertoire of oligonucleotide tags of a predetermined
length, the method comprising the steps of:

(a) providing a repertoire of oligonucleotide tag precursors in an amplicon, the
oligonucleotide tag precursors each comprising one or more words, and each of the one or
more words being selected from the same minimally cross-hybridizing set;

(b) cleaving the amplicon at a word in each of the oligonucleotide tag precursors to
form one or more ligatable ends on each oligonucleotide tag precursor;

(c) ligating one or more words to the one or more ligatable ends to elongate each of the
oligonucleotide tag precursors;

(d) amplifying the elongated oligonucleotide tag precursors in the amplicon; and

(e) repeating steps (b) through (d) until a repertoire of oligonucleotide tags having the
predetermined length is formed.

2. The method of claim 1 wherein said amplicon is a cloning vector. 

3. The method of claim 2 wherein said step of cleaving includes cleaving said amplicon 
in a region adjacent to said word by a type IIs restriction endonuclease. 

4. The method of claim 3 wherein said word has a length in the range of from three to 
fourteen nucleotides. 

5. The method of claim 4 wherein said oligonucleotide tag has a length in the range of from 
18 to 60 nucleotides. 

6. The method of claim 2 wherein said step of cleaving includes cleaving said amplicon 
across said word by a type IIs restriction endonuclease. 

7. The method of claim 2 wherein said word has a length of four and wherein said 
oligonucleotide tag has a length in the range of from 18 to 40. 
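
The cycle recited in claim 1 can be sketched in software as repeated one-word extension of every precursor. The following is a minimal illustration only, not the claimed laboratory procedure. It assumes that each cycle adds exactly one word, that the ligation mixture contains every word of the set, and that amplification changes copy number but not sequence. The function name `elongate_repertoire` and the four placeholder words are hypothetical and are not taken from the specification.

```python
from itertools import product


def elongate_repertoire(words, words_per_tag):
    """Model steps (a)-(e) of claim 1 as repeated one-word extension.

    Assumption: each cycle adds exactly one word to every precursor and the
    ligation mixture contains every word of the set.
    """
    if words_per_tag < 1:
        raise ValueError("words_per_tag must be at least 1")
    repertoire = set(words)  # step (a): precursors holding a single word
    cycles = 0
    for _ in range(words_per_tag - 1):
        # steps (b)-(c): cleave at the terminal word, then ligate each word
        repertoire = {tag + word for tag, word in product(repertoire, words)}
        # step (d): amplification only changes copy number, so it is a no-op
        cycles += 1
    # step (e): the loop ends once the predetermined length is reached
    return repertoire, cycles


# Placeholder four-nucleotide words, chosen only for illustration.
WORDS = ["aaaa", "aatt", "ttaa", "tttt"]
tags, cycles = elongate_repertoire(WORDS, words_per_tag=3)
print(len(tags), cycles)
```

With k words and m words per tag, this model yields k^m distinct tags after m - 1 cycles, which is why a small word set can generate a large repertoire.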

8. A repertoire of oligonucleotide tags, wherein the oligonucleotide tags of the repertoire 
are of the form: 

$w_1(N)_{x_1}w_2(N)_{x_2} \ldots (N)_{x_{n-1}}w_n$ 

wherein each of $w_1$ through $w_n$ is a word consisting of an oligonucleotide having a length from 
three to fourteen nucleotides or basepairs and being selected from the same minimally 
cross-hybridizing set wherein a word of the set and a complement of any other word of the set has at 
least two mismatches; N is a nucleotide or basepair; each of $x_1$ through $x_{n-1}$ is an integer 
selected from the group consisting of 0, 1, 2, 3, and 4, provided that at least one of $x_1$ through 
$x_{n-1}$ is 1, 2, 3, or 4; and n is an integer in the range of from 4 to 10. 

9. The repertoire of claim 8 wherein each of said $x_1$ through $x_{n-1}$ is selected from the 
group consisting of 0, 1, and 2, and wherein said length of said word is from four to ten 
nucleotides or basepairs. 

10. The repertoire of claim 9 wherein said oligonucleotide tags are single stranded and 
wherein n is in the range of from 6 to 10. 

11. The repertoire of claim 10 wherein a duplex between each of said words of said 
minimally cross-hybridizing set and said complement of any other word of said set would have 
at least three mismatches. 

12. The repertoire of claim 11 wherein a duplex between each of said words of said 
minimally cross-hybridizing set and said complement of any other word of said set would have 
at least five mismatches whenever said word has a length of greater than or equal to six 
nucleotides. 



13. The repertoire of claim 10 having a number of said oligonucleotide tags that is in the 
range of from 100 to 1x10^. 

14. The repertoire of claim 13 having a number of said oligonucleotide tags that is in the 
range of from 1000 to 1 x 10^. 
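
The mismatch conditions in claims 8, 11, and 12 can be checked computationally for a candidate word set. The sketch below assumes equal-length words aligned end to end with no frameshift, in which case the number of mismatches in a duplex between one word and the complement of another equals the number of positions at which the two words differ. The function names and the example words are hypothetical and serve only as illustration.

```python
from itertools import combinations


def mismatches(word_a, word_b):
    """Count positions at which two equal-length words differ.

    Under the stated alignment assumption, this equals the number of
    mismatched basepairs in a duplex of word_a with the complement of word_b.
    """
    if len(word_a) != len(word_b):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(word_a, word_b))


def min_pairwise_mismatches(words):
    """Smallest mismatch count over every pair of distinct words in the set."""
    return min(mismatches(a, b) for a, b in combinations(words, 2))


def is_minimally_cross_hybridizing(words, threshold=2):
    """True when every word differs from every other word by >= threshold."""
    return min_pairwise_mismatches(words) >= threshold


# Placeholder words, chosen only for illustration.
CANDIDATES = ["aaaa", "aatt", "ttaa", "tttt"]
print(min_pairwise_mismatches(CANDIDATES))
print(is_minimally_cross_hybridizing(CANDIDATES, threshold=2))
print(is_minimally_cross_hybridizing(CANDIDATES, threshold=3))
```

Raising `threshold` to 3 or 5 corresponds to the stricter conditions of claims 11 and 12. The check is quadratic in the number of words, which is inexpensive for small word sets.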

15. A repertoire of cloning vectors for attaching oligonucleotide tags to polynucleotides, 
wherein each of the vectors comprises a double stranded element corresponding to an 
oligonucleotide tag of the form: 

$w_1(N)_{x_1}w_2(N)_{x_2} \ldots (N)_{x_{n-1}}w_n$ 

wherein each of $w_1$ through $w_n$ is a word consisting of an oligonucleotide having a length from 
three to fourteen nucleotides and being selected from the same minimally cross-hybridizing set 
wherein a word of the set and a complement of any other word of the set has at least two 
mismatches; N is a nucleotide; each of $x_1$ through $x_{n-1}$ is an integer selected from the group 
consisting of 0, 1, 2, 3, and 4, provided that at least one of $x_1$ through $x_{n-1}$ is 1, 2, 3, or 4; and 
n is an integer in the range of from 4 to 10. 



16. The repertoire of claim 15 wherein each of said $x_1$ through $x_{n-1}$ is selected from the 
group consisting of 0, 1, and 2, and wherein said length of said word is from four to ten 
nucleotides or basepairs. 
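
As an illustration of the tag form recited in claims 8 and 15, the sketch below assembles a tag from n words and n - 1 spacer segments, enforces the numeric limits stated in claim 15, and derives the complementary strand of the double stranded element. It is a sketch under stated assumptions: the names `assemble_tag` and `reverse_complement`, the example words, and the example spacers are hypothetical, and membership of the words in a minimally cross-hybridizing set is not checked here.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def assemble_tag(words, spacers):
    """Join words and spacers as w1 (N)x1 w2 (N)x2 ... (N)x(n-1) wn."""
    n = len(words)
    if not 4 <= n <= 10:
        raise ValueError("n must be in the range 4 to 10")
    if len(spacers) != n - 1:
        raise ValueError("exactly n - 1 spacers are required")
    if any(len(spacer) > 4 for spacer in spacers):
        raise ValueError("each spacer must be 0 to 4 nucleotides long")
    if not any(spacers):
        raise ValueError("at least one spacer must be 1 to 4 nucleotides long")
    pieces = [words[0]]
    for spacer, word in zip(spacers, words[1:]):
        pieces.extend([spacer, word])
    return "".join(pieces)


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


# Placeholder words and spacers, chosen only for illustration.
EXAMPLE_WORDS = ["aaaa", "aatt", "ttaa", "tttt"]
EXAMPLE_SPACERS = ["g", "", "cc"]
tag = assemble_tag(EXAMPLE_WORDS, EXAMPLE_SPACERS)
print(tag)
print(reverse_complement(tag))
```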






WO 00/20639                                                  PCT/US99/22585 

[Fig. 1A, sheet 1/3: drawing not reproduced. Reference numerals: 102, 104, 107, 108, 109, 112. Step label: "Excise repertoire (132)".] 



[Fig. 1B, sheet 2/3: drawing not reproduced. Reference numerals: 152, 155, 156, 157, 164, 178, 180. Legible step labels: "Insert and ligate (182)", "Excise repertoire (186)". Remaining labels are illegible.] 



[Fig. 2, sheet 3/3: drawing not reproduced. Reference numerals: 202, 204, 208, 210, 212, 216, 218, 220, 225, 226. Legible step label: "Excise repertoire (236)". The label for step (224) is illegible.] 



INTERNATIONAL SEARCH REPORT 

International application No. PCT/US99/22585 

A. CLASSIFICATION OF SUBJECT MATTER 

IPC(7): C12Q 1/68; C12P 19/34; C07H 19/00, 21/00, 21/02, 21/04 
US CL: 435/6, 91.1, 91.2; 536/22.1, 23.1, 25.3, 25.31, 25.32 
According to International Patent Classification (IPC) or to both national classification and IPC 

B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 
U.S.: 435/5, 91.1, 91.2; 536/22.1, 23.1, 25.3, 25.31, 25.32 

Documentation searched other than minimum documentation to the extent that such documents are included 
in the fields searched 

Electronic data base consulted during the international search (name of data base and, where practicable, 
search terms used) 
STN 
search terms: oligonucleotides, tags, amplicon, amplification, endonuclease, vector 


C. DOCUMENTS CONSIDERED TO BE RELEVANT 

Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No. 

Y | EP 0 292 128 A1 (TAMIR BIOTECHNOLOGIES LTD) 23 November 1988, see entire document. | 1-16 

Y | BRENNER et al. Encoded Combinatorial Chemistry. Proc. Natl. Acad. Sci., USA. June 1992, Vol. 89, pages 5381-5383, see entire document. | 1-16 

Y | WO 93/06121 A1 (AFFYMAX TECHNOLOGIES N.V.) 01 April 1993, see entire document. | 1-16 

[ ] Further documents are listed in the continuation of Box C. 

[ ] See patent family annex. 




* Special categories of cited documents: 

"A" document defining the general state of the art which is not considered 
to be of particular relevance 

"E" earlier document published on or after the international filing date 

"L" document which may throw doubts on priority claim(s) or which is 
cited to establish the publication date of another citation or other 
special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or other means 

"P" document published prior to the international filing date but later than 
the priority date claimed 

"T" later document published after the international filing date or priority 
date and not in conflict with the application but cited to understand 
the principle or theory underlying the invention 

"X" document of particular relevance; the claimed invention cannot be 
considered novel or cannot be considered to involve an inventive step 
when the document is taken alone 

"Y" document of particular relevance; the claimed invention cannot be 
considered to involve an inventive step when the document is 
combined with one or more other such documents, such combination 
being obvious to a person skilled in the art 

"&" document member of the same patent family 


Date of the actual completion of the international search 
15 JANUARY 2000 

Date of mailing of the international search report 

Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 
Washington, D.C. 20231 
Facsimile No. (703) 305-3230 

Authorized officer 
JEZIA RILEY 
Telephone No. (703) 30^196 

Form PCT/ISA/210 (second sheet) (July 1992)* 



